Asymptotic derivation of the finite-sample risk of the k nearest neighbor classifier (Technical Report UVM-CS-1998-0101)
Abstract
The finite-sample risk of the k nearest neighbor classifier that uses a weighted L_p metric as a measure of class similarity is examined. For a family of classification problems with smooth distributions in R^n, an asymptotic expansion for the risk is obtained in decreasing fractional powers of the reference sample size. An analysis of the leading expansion coefficients reveals that the optimal weighted L_p metric, i.e., the metric that minimizes the finite-sample risk, tends to a weighted Euclidean (i.e., L_2) metric as the sample size increases. Numerical simulations corroborate this finding for a pattern recognition problem with normal class-conditional densities.
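As a rough illustration of the classifier the abstract refers to, the following minimal Python sketch implements a k nearest neighbor majority vote under a weighted L_p distance. The form of the weighting (one non-negative weight per coordinate, applied inside the sum), the function names, and the toy reference sample are all assumptions made for this example; the report's own parametrization of the metric is not given in the abstract.

```python
from collections import Counter


def weighted_lp_distance(x, y, weights, p=2.0):
    """Assumed form of a weighted L_p distance:
    (sum_i w_i * |x_i - y_i|**p) ** (1/p), with w_i >= 0 and p >= 1."""
    total = sum(w * abs(a - b) ** p for w, a, b in zip(weights, x, y))
    return total ** (1.0 / p)


def knn_classify(query, reference, k, weights, p=2.0):
    """Label `query` by majority vote among its k nearest reference points.

    `reference` is a list of (point, label) pairs; distances are measured
    with weighted_lp_distance. Use an odd k in two-class problems to avoid ties.
    """
    nearest = sorted(
        reference,
        key=lambda item: weighted_lp_distance(query, item[0], weights, p),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-class reference sample in R^2.
reference_sample = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.2), "a"),
    ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"),
    ((0.9, 1.1), "b"),
]

if __name__ == "__main__":
    # Compare the same query under p = 1 and p = 2 with equal weights.
    for p in (1.0, 2.0):
        label = knn_classify((0.0, 0.1), reference_sample, k=3,
                             weights=(1.0, 1.0), p=p)
        print(f"p = {p}: predicted class {label}")
```

Setting all weights to 1 and p = 2 recovers the ordinary Euclidean metric, which is the special case that the optimal metric approaches in the abstract's large-sample result.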
Similar resources
Asymptotic Behaviors of Nearest Neighbor Kernel Density Estimator in Left-truncated Data
Kernel density estimators are the basic tools for density estimation in non-parametric statistics. The k-nearest neighbor kernel estimators represent a special form of kernel density estimators, in which the bandwidth is varied depending on the location of the sample points. In this paper, we initially introduce the k-nearest neighbor kernel density estimator in the random left-truncatio...
On the finite sample performance of the nearest neighbor classifier
Abstract: The finite sample performance of a nearest neighbor classifier is analyzed for a two-class pattern recognition problem. An exact integral expression is derived for the m-sample risk R_m, given that a reference m-sample of labeled points is available to the classifier. The statistical setup assumes that the pattern classes arise in nature with fixed a priori probabilities and that points ...
The labelled cell classifier: a fast approximation to k nearest neighbors
A k-nearest-neighbor classifier is approximated by a labeled cell classifier that recursively labels the nodes of a hierarchically organized reference sample (e.g., a k-d tree) if a local estimate of the conditional Bayes risk is sufficiently small. Simulations suggest that the labeled cell classifier is significantly faster than k-d tree implementations for problems with small Bayes risk; and ...
Comparing pixel-based and object-based algorithms for classifying land use of arid basins (Case study: Mokhtaran Basin, Iran)
In this research, two techniques of pixel-based and object-based image analysis were investigated and compared for providing land use map in arid basin of Mokhtaran, Birjand. Using Landsat satellite imagery in 2015, the classification of land use was performed with three object-based algorithms of supervised fuzzy-maximum likelihood, maximum likelihood, and K-nearest neighbor. Nine combinations...
Diagnosis of Tempromandibular Disorders Using Local Binary Patterns
Background: Temporomandibular joint disorder (TMD) might be manifested as structural changes in bone through modification, adaptation or direct destruction. We propose to use Local Binary Pattern (LBP) characteristics and histogram-oriented gradients on the recorded images as a diagnostic tool in TMD assessment. Material and Methods: CBCT images of 66 patients (132 joints) with TMD and 66 normal...